Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information
Abstract
Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement, and thereby attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms the one trained with the phoneme-based E2E-ASR. The results suggest that objectives affected by the misclassification of phonemes by the ASR system may lead to imperfect feedback, and that BPCs could be a potentially better choice. Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve SE performance.
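The idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the manner-of-articulation grouping in `BPC_BY_MANNER`, the helper names, and the weights `alpha` and `beta` are hypothetical and are not taken from the paper. The sketch shows why grouping confusable phonemes into one class can give cleaner feedback (a /p/ vs. /b/ confusion is a phoneme error but not a BPC error) and how an SE loss might be combined with ASR and perceptual terms as a weighted sum.

```python
# Hypothetical manner-of-articulation grouping. The abstract does not give
# the paper's actual BPC inventory, so this table is illustrative only.
BPC_BY_MANNER = {
    "vowel": ["aa", "ae", "ah", "eh", "iy", "uw"],
    "stop": ["p", "b", "t", "d", "k", "g"],
    "fricative": ["f", "v", "s", "z", "sh", "th"],
    "nasal": ["m", "n", "ng"],
    "semivowel": ["l", "r", "w", "y"],
    "silence": ["sil"],
}

# Invert the table: phoneme label -> broad phonetic class label.
PHONEME_TO_BPC = {
    phoneme: bpc
    for bpc, phonemes in BPC_BY_MANNER.items()
    for phoneme in phonemes
}


def to_bpc_sequence(phonemes):
    """Map a phoneme sequence to its broad-phonetic-class sequence."""
    return [PHONEME_TO_BPC[p] for p in phonemes]


def mismatch_rate(reference, hypothesis):
    """Fraction of positions that differ (equal-length sequences only)."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


def multi_objective_loss(se_loss, asr_loss, perceptual_loss,
                         alpha=0.1, beta=0.1):
    """Weighted sum of the three objectives (weights are assumed values)."""
    return se_loss + alpha * asr_loss + beta * perceptual_loss


# A recognizer that confuses /p/ with /b/ makes a phoneme-level error,
# but both phonemes fall into the same class, so the BPC target is intact.
reference = ["s", "p", "iy", "k"]
hypothesis = ["s", "b", "iy", "k"]
phoneme_error = mismatch_rate(reference, hypothesis)
bpc_error = mismatch_rate(to_bpc_sequence(reference),
                          to_bpc_sequence(hypothesis))
```

In a real system the ASR term would come from an E2E-ASR model scoring the enhanced speech against the BPC sequence; the position-wise mismatch above only stands in for that to keep the example self-contained.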
Similar resources
Automatic Syllable Segmentation Using Broad Phonetic Class Information
We propose in this paper a language-independent method for syllable segmentation. The method is based on the Sonority Sequencing Principle, by which the sonority inside a syllable increases from its boundaries towards the syllabic nucleus. The sonority function employed was derived from the posterior probabilities of a broad phonetic class recognizer, trained with data coming from an open-sourc...
Language-independent Automatic Syllable Segmentation Using Broad Phonetic Class Information
We propose in this paper a language-independent method for syllable segmentation. The method is based on the Sonority Sequencing Principle, by which the sonority inside a syllable increases from its boundaries towards the syllabic nucleus. The sonority function employed was derived from the posterior probabilities of a broad phonetic class recognizer, trained with data coming from an open-sourc...
On Improving Face Detection Performance by Modelling Contextual Information
In this paper we present a new method to enhance object detection by removing false alarms and merging multiple detections in a principled way with few parameters. The method models the output of an object classifier which we consider as the context. A hierarchical model is built using the detection distribution around a target sub-window to discriminate between false alarms and true detections...
Bio-inspired Broad-class Phonetic Labelling
Recent studies have shown that the correct labeling of phonetic classes may help current Automatic Speech Recognition (ASR) when combined with classical parsing automata based on Hidden Markov Models (HMM). In the present paper, a method for Phonetic Class Labeling (PCL) based on bio-inspired speech processing is described. The methodology is based on the automatic detection of formants and...
The use of broad phonetic class models in speaker recognition
In this paper we investigate the use of broad phonetic class (BPC) models in a text independent speaker recognition task. These models can be used to bring down the variability due to the intrinsic differences between mutual phonetic classes in the speech material used for training of the speaker models. Combining BPC recognition with text independent speaker recognition moves a bit in the dire...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2023
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2023.3288418